Segmentation of Unstructured Newspaper Documents
Similar resources
Arabic Newspaper Page Segmentation
The aim of layout analysis is to extract the geometric structure from a document image. It consists of labeling homogeneous regions of a document image. This paper describes the performance of segmentation algorithms and their adaptation in order to treat complex structured Arabic documents such as newspapers. Experimental tests have been carried out on four different phases of newspaper image a...
Anonimytext: Anonimization of Unstructured Documents
The anonymization of unstructured texts is nowadays a task of great importance in several text mining applications. Medical records anonymization is needed both to preserve personal health information privacy and to enable further data mining efforts. The described ANONYMITEXT system is designed to de-identify sensitive data in unstructured documents. It has been applied to Spanish clinical notes...
Ontology-Based Semantic Classification of Unstructured Documents
As more and more knowledge and information becomes available through computers, a critical capability of systems supporting knowledge management is the classification of documents into categories that are meaningful to the user. In a step beyond the use of keywords, we developed a system that analyzes the sentences contained in unstructured or semi-structured documents, and utilizes an ontology...
Segmentation of Compressed Documents
We present a novel technique for segmentation of a JPEG-compressed document based on block activity. The activity is measured as the number of bits spent to encode each block. Each number is mapped to a pixel brightness value in an auxiliary image which is then used for segmentation. We introduce the use of such an image and show an example of a simple segmentation algorithm, which was successfu...
Features for Neural Net Based Region Identification of Newspaper Documents
Several features for Neural Network based document region identification are tested. Specifically, this paper examines features for non-text region identification. The Neural Network based region identification algorithm is a key component of a document recognition system that segments a document into regions, classifies them into text, graphic, photo, and other region types, and then uses this...
Journal
Journal title: International Journal of Advanced Engineering Research and Science
Year: 2017
ISSN: 2349-6495,2456-1908
DOI: 10.22161/ijaers.4.5.13